Estimation in the Presence of Noise of a Signal Which is
Flat Except for Jumps - Part I, A Bayesian Study

Abstract 

Consider the problem of estimating, in a Bayesian
framework and in the presence of additive Gaussian noise,
a signal which is a random step function. The best linear
estimates, the Bayes estimates and the estimates with known
change points are derived, evaluated and compared
analytically and numerically. A characterization of the
Bayes estimates is presented. This characterization has a
reasonable interpretation and also provides a way to
compute the Bayes estimates with a number of operations of
the order of $T^3$, where $T$ is the fixed time span. An
approximation to the Bayes estimates is proposed which is
reasonably good and reduces the total number of operations
to the order of $T$.

Key words: Change points, nonlinear filtering, smoothing, Bayesian inference.

AMS 1980 subject classification: Primary 62M20, 93E14; Secondary 62G05,
93E11, 62F15


1.  Introduction 

We consider the problem of estimating, in a Bayesian
framework,  a  signal  which  is  a  step  function  when  one 
observes  the  signal  plus  Gaussian  noise.  Optimal  linear 
and  nonlinear  estimates  are  derived  and  compared. 

This problem is a simplified version of a more general
one, applications of which appear in many fields where the
unknown underlying structure is a function, of one or
more variables, which is discontinuous or has discontinuous
derivatives. Several examples follow.

(A)  In  seismology,  the  density  of  the  sedimentary 
layers  of  the  earth's  crust  can  be  locally  approximated 
by  a  piecewise  constant  function. 

(B)  In  tomography,  the  density  of  the  contents  of 
the  head  may  have  discontinuities  due  to  tumors  as  well 
as  those  due  to  the  skull. 

(C)  In  image  processing,  the  light  intensity  of  a 
picture  changes  from  object  to  object. 

(D) In econometric modelling, ARMA processes are
commonly used and their parameters may be subjected to
changes due to sudden shifts of governmental policies and
international relationships.

(E) In regression analysis, regression curves may be
made up of broken straight lines.

(F) In tracking problems, a target may be liable to
make sudden changes in direction.

In the above cases, it is desired to estimate the
signal processes, i.e. the density functions, the
intensity of light, the parameters of ARMA processes,
the regression curves and the path of the moving target.
They can be measured either directly with measurement
error (in (C), (E)) or indirectly through various
transformations (in (A), (B), (D), and (F)). There are two
important and relevant problems:

(1)  Can  one  estimate  such  signals  efficiently? 

(2)  Can  one  detect  whether  or  when  (or  where)  a 
process  changes  its  character? 

The  second  problem  is  particularly  interesting  in  quality 
control,  and  in  the  engineering  literature  it  is  called 
detection.  Paradoxically,  the  first  problem  is  called 
smoothing  when  it  is  applied  to  smooth  signals.  Smoothing 
is  used  in  contrast  to  filtering  where  the  estimate  of  the 
signal  at  time  t  is  based  on  the  observed  process  only 
up  to  time  t  and  not  beyond. 

We shall restrict ourselves to the simple case where
the signal processes are flat except for jumps and can be
measured directly. In other words, in discrete time denote
the signal process by $\{\mu_n\}$ and let $\mu_{n+1} = \mu_n$
except for occasional changes. Let the observations be
$X_n = \mu_n + \epsilon_n$, $n = 1, 2, \ldots, T$, where the $\epsilon_n$ are
measurement noise. We shall concentrate on estimating the signal
process and pay little attention to detecting change
points.

If the change points were known, we could estimate $\mu_n$
by the average of the data points between the two
surrounding change points. If jumps are not large, it is
hard to tell when jumps take place and to take appropriate
action. Moreover, if measurement noise has a heavy-tailed
distribution, outliers may be disguised as jumps.

In order to develop insight for estimating the
signal from the observations, we take a Bayesian point of
view and consider a simple model. To be specific, we
will characterize the underlying problem through the
following special assumptions, which form the discrete
time version of a model of Duncan (see Barnard (1959),
p. 255).

(1)  The  sequence  of  the  change  points  forms  a 
discrete  renewal  process  with  identically  geometrically 
distributed  interarrival  times. 

(2) The distinct heights of the signal are mutually
independent draws from a common Gaussian distribution.

(3)  The  measurement  noise  is  Gaussian  white  noise. 
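The three assumptions above can be turned into a small simulator. The following is a minimal sketch, not part of the original report: the function name `simulate_step_signal` and its arguments are hypothetical, and it assumes (as the later sections indicate) a prior mean of 0 for the signal heights and unit noise variance. Geometric interarrival times are generated through an equivalent Bernoulli change indicator at each time step.

```python
import numpy as np


def simulate_step_signal(T, p, sigma2, rng=None):
    """Draw (mu, x, j) under the three model assumptions.

    T      : number of time points
    p      : probability of a change between consecutive times
    sigma2 : variance of the Gaussian signal heights (prior mean 0, assumed)
    rng    : seed or numpy Generator (hypothetical convenience argument)
    """
    rng = np.random.default_rng(rng)
    # j[n] == 1 marks a change between (0-based) times n and n + 1, so the
    # gaps between change points are geometric with parameter p.
    j = (rng.random(T - 1) < p).astype(int)
    # Candidate heights, one per time point; only those at changes are used.
    y = rng.normal(0.0, np.sqrt(sigma2), size=T)
    mu = np.empty(T)
    mu[0] = y[0]
    for n in range(T - 1):
        mu[n + 1] = y[n + 1] if j[n] else mu[n]
    # Observations: signal plus Gaussian white noise with unit variance.
    x = mu + rng.normal(0.0, 1.0, size=T)
    return mu, x, j
```

For example, `mu, x, j = simulate_step_signal(20, 0.1, 4.0, rng=0)` draws one realization with the time span T = 20 used in the numerical comparison.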

Barnard (1959) and Chernoff and Sacks (1964) studied
a similar model where the number of operations required
to compute the Bayes solution is of the order of $2^T$.
Here $T$ is the fixed time span. In contrast, we will
see that in our case the Bayes solution can be computed
with a number of operations of the order of $T^3$.

With respect to the above three assumptions, some
basic questions arise:

(Qa) How well can we do if we know the change
points?

(Qb) How well can we do if we use the best linear
estimate?

(Qc) How well can we do with the best nonlinear
estimate?

(Qd) If the parameters are not known, can we estimate
them from the empirical data?

(Qe) What if the model is not satisfied?

(Qf) What about the analogous continuous time
problem?

The first three of these questions are studied in
great detail here. Results on the others will be
presented in a forthcoming report. In Section 2, the
Bayesian model is formulated more precisely. In Section
3, the minimum variance linear estimates of the signal
are derived and their average mean squared error is
expressed in a closed form. In Section 4, a characterization
of the Bayes solution is presented which has a
reasonable interpretation. In Section 5, a good approximation
to the Bayes solution is proposed. In Section 6, the
estimates based on the additional knowledge of the change
points are considered and their asymptotic average mean
squared error as $T \to \infty$ is found. In Section 7, four
types of estimates are evaluated and compared. Finally,
there is an appendix where detailed derivations are
presented for some of the less obvious results presented
in the previous five sections.

2. The Special Bayesian Model

The  three  assumptions  of  the  model  are  described 
more  precisely  below. 

(1) Let $J = (J_1, J_2, \ldots, J_{T-1})$ be a Bernoulli
sequence indicating when changes take place, i.e.

(2.1)  $J_n = 1$ if there is a change between $n$ and $n+1$,
       $J_n = 0$ otherwise,

where $\Pr(J_n = 1) = p$ for $1 \le n \le T-1$. For convenience,
define $J_0 = J_T = 1$.

(2) Let $Y_1, Y_2, \ldots, Y_T$ be i.i.d. $N(0, \sigma^2)$. Define
the signal process recursively as follows.

(2.2)  $\mu_1 = Y_1$,
       $\mu_{n+1} = (1 - J_n)\,\mu_n + J_n Y_{n+1}$,  $n = 1, 2, \ldots, T-1$.


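(3) Let the observation process $\{X_n\}$ be given by

(2.3)  $X_n = \mu_n + \epsilon_n$,  $n = 1, 2, \ldots, T$,

where $\{\epsilon_n\}$ is Gaussian white noise with variance 1, independent of $J$ and the $Y_n$.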


where 


f,(X)  -  a(X)  (u^(X))^  +  b(X)  (u.(X))*, 

Uj(X)  -  tl-  B^-Xa+  p^)J  /(l-O^-X  (l+s^))^-4c^X^|/2, 

a(X)  -  l(l-X)^  -0^  -  (l-X)u.l/{u^^-u^u.) ,  . 

b(X)  -  ((l-X)u^  -  (1-X)^  +  c^l/(u^u_  -  u.^). 

4. The Bayes Solution - the Minimum Variance Nonlinear
Estimates.

The Bayes solution can be computed by brute force with
a number of operations of the order of $2^T$. In this
section, we present a characterization which has a reasonable
interpretation and also provides a way to compute the
solution with $O(T^3)$ operations.
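To make the brute-force cost concrete, the sketch below enumerates all $2^{T-1}$ change configurations and averages the known-change-point estimates with their posterior weights. It is an illustration under stated assumptions rather than the author's procedure: the names `segment_log_weight` and `brute_force_bayes` are hypothetical, the noise variance is taken to be 1 and the prior mean 0, and the weight of a segment with $k$ observations summing to $s$ is $(1 + k\sigma^2)^{-1/2} \exp\{s^2 / (2(k + \sigma^{-2}))\}$, which follows from integrating the Gaussian likelihood against the $N(0, \sigma^2)$ prior.

```python
import itertools
import math


def segment_log_weight(k, s, sigma2):
    """Log marginal likelihood of one flat segment, up to terms free of J."""
    return -0.5 * math.log1p(k * sigma2) + s * s / (2.0 * (k + 1.0 / sigma2))


def brute_force_bayes(x, p, sigma2):
    """Posterior means E(mu_n | x_1..x_T) by enumerating every change pattern."""
    T = len(x)
    log_ws, ests = [], []
    for j in itertools.product((0, 1), repeat=T - 1):
        m = sum(j)
        prior = p ** m * (1.0 - p) ** (T - 1 - m)
        if prior == 0.0:
            continue  # pattern impossible under this p
        # Segment boundaries: a change after index n starts a segment at n + 1.
        cuts = [0] + [n + 1 for n in range(T - 1) if j[n]] + [T]
        log_w = math.log(prior)
        est = [0.0] * T
        for a, b in zip(cuts[:-1], cuts[1:]):
            k, s = b - a, sum(x[a:b])
            log_w += segment_log_weight(k, s, sigma2)
            # Shrunken segment mean: posterior mean given the change points.
            est[a:b] = [s / (k + 1.0 / sigma2)] * k
        log_ws.append(log_w)
        ests.append(est)
    top = max(log_ws)  # subtract the maximum for numerical stability
    ws = [math.exp(lw - top) for lw in log_ws]
    total = sum(ws)
    return [sum(w * e[n] for w, e in zip(ws, ests)) / total for n in range(T)]
```

This is only practical for small $T$ (say $T \le 15$), which is precisely the motivation for the characterization developed in this section.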

In the following, we consider the conditional
distributions of $\mu_n$ based on (1) the past and present data,
$L(\mu_n \mid X_1, \ldots, X_n)$, (2) the future data, $L(\mu_n \mid X_{n+1}, \ldots, X_T)$,
and (3) all of the data, $L(\mu_n \mid X_1, X_2, \ldots, X_T)$. We will
see that $L(\mu_n \mid X_1, \ldots, X_n)$ and $L(\mu_n \mid X_{n+1}, \ldots, X_T)$ can
be computed recursively and $L(\mu_n \mid X_1, X_2, \ldots, X_T)$ can be
computed by use of $L(\mu_n \mid X_1, \ldots, X_n)$ and $L(\mu_n \mid X_{n+1}, \ldots, X_T)$.
Here are convenient notations:

(1) $X_i^j = (X_i, X_{i+1}, \ldots, X_j)$ and $x_i^j = (x_i, x_{i+1}, \ldots, x_j)$.

(2) $S_0 = 0$, $S_n = \sum_{k=1}^{n} X_k$ (cumulative sum).

(3) $L(Y)$ = the distribution of random variable $Y$.

(4) $f_{\mu_n}(z \mid x_i^j)$ = the conditional probability density
of $\mu_n$ at $z$ given $X_i^j = x_i^j$.

(5) "$f(x,z) \propto g(x,z)$ in $z$" means that there exists
$c(x)$ such that $f(x,z) = c(x)\,g(x,z)$ for all $x, z$.

4.1 An Expression for $L(\mu_n \mid X_1^n)$.

Proposition 4.1

$f_{\mu_{n+1}}(z \mid x_1^{n+1}) \propto \phi(x_{n+1} - z)\,\bigl((1-p)\, f_{\mu_n}(z \mid x_1^n) + p\, f_\mu(z)\bigr)$

in $z$, $1 \le n \le T-1$,

where $\phi$ is the standard normal density and

(4.1)  $f_\mu(x) = (2\pi\sigma^2)^{-1/2} \exp(-x^2 / 2\sigma^2)$

is the density of the prior signal distribution.

The proof of Proposition 4.1 appears in Appendix A.1
and it is a simple application of Bayes' theorem. This
proposition is an updating formula for computing
$L(\mu_n \mid X_1^n)$, $n = 1, 2, \ldots, T$. Since $L(\mu_1 \mid X_1) = N(X_1 (1+\sigma^{-2})^{-1},
(1+\sigma^{-2})^{-1})$, we can demonstrate by use of induction and
Proposition 4.1

Proposition 4.2


(n) 


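Remarks:

(1) Since, given that the last change point before time $n$ lies $k$
steps back, $\mu_n$ is conditionally normal with mean
$(S_n - S_{n-k})/(k+\sigma^{-2})$ and variance $1/(k+\sigma^{-2})$,
one may readily see from Proposition 4.2 that

$L(\mu_n \mid X_1^n) = \sum_{k=1}^{n} a_k^{(n)}\, N\bigl((S_n - S_{n-k})/(k+\sigma^{-2}),\ 1/(k+\sigma^{-2})\bigr)$,

where

$a_k^{(n)} = \Pr(J_{n-1} = \cdots = J_{n-k+1} = 0,\ J_{n-k} = 1 \mid X_1^n)$, $k = 1, \ldots, n$, is the
conditional probability distribution of the last change
point before time $n$.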


,(n) 


P(l-P)' 


2 (k4c"^7 


(n-l,2,,..,T;  k*l,...,n) 


(2)  Elii^lxV)  - 


,  (n) 


k-l 


^)  »  J  a 
1-1 


(n). 


Ojj  (n*l/2, . . .  ,T  *■  1)  are  defined  recursively  by 
Oj  •  1,  and 


n+1  n-k+l 


P(t-P) 


k-1 


exp 


,n 


where 

"  I  A*”Vu+o“^). 

*  K-n-t+1 

Obviously,  0  <  aj**^  <  aj”^  <  . , ,  <  a^***  ,  and 


Barnard (1959) noticed that $L(\mu_n \mid X_1^n)$ is a mixture of $n$
normal distributions. However, he did not present an
explicit expression for the coefficients $a_k^{(n)}$.
From Proposition 4.2, we have

Proposition 4.3

$E(\mu_n \mid X_1^n) = \sum_{k=1}^{n} a_k^{(n)}\, \frac{S_n - S_{n-k}}{k+\sigma^{-2}} = \sum_{i=1}^{n} \Bigl( \sum_{k=n-i+1}^{n} \frac{a_k^{(n)}}{k+\sigma^{-2}} \Bigr) X_i$,

and the coefficients of the $X_i$ sum to

$\sum_{k=1}^{n} a_k^{(n)}\, \frac{k}{k+\sigma^{-2}} < \sum_{k=1}^{n} a_k^{(n)} = 1$.

Thus, the (one-sided) Bayes solution, $E(\mu_n \mid X_1^n)$, is a sample
dependent weighted average of the observations and the
prior mean 0. There is a "shrinkage" toward the prior mean
of the signal.

(3) The number of operations to compute $\alpha_n$, given $\alpha_k$, $k<n$,
is $O(n)$ according to Proposition 4.2. The total number of
operations to compute $L(\mu_n \mid X_1^n)$ for all $n$ is $O(T^2)$.
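One way to realize the $O(T^2)$ one-sided computation is to carry the mixture over the position of the last change point forward in time with the update of Proposition 4.1. The sketch below does this with normalized mixture weights instead of the $\alpha_n$ of Proposition 4.2, so it should be read as an equivalent reformulation under our own naming (`exact_filter`, `normal_pdf` are hypothetical), assuming unit noise variance and prior mean 0. Each existing component is extended with weight proportional to $(1-p)$ times its predictive density at the new observation, and a fresh component is opened with weight proportional to $p$ times the prior predictive density.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x (vectorized over mean and var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def exact_filter(x, p, sigma2):
    """One-sided Bayes estimates E(mu_n | x_1..x_n) for n = 1..T."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    prec0 = 1.0 / sigma2
    # a[k-1]  : posterior probability that the current flat stretch has length k
    # seg[k-1]: sum of the last k observations
    a = np.array([1.0])
    seg = np.array([x[0]])
    means = np.empty(T)
    means[0] = x[0] / (1.0 + prec0)
    for n in range(1, T):
        k = np.arange(1, n + 1)
        m = seg / (k + prec0)          # component means before seeing x[n]
        v = 1.0 / (k + prec0)          # component variances before seeing x[n]
        # No change before time n: the stretch grows by one observation.
        grow = (1.0 - p) * a * normal_pdf(x[n], m, 1.0 + v)
        # Change just before time n: a fresh height drawn from the prior.
        fresh = p * normal_pdf(x[n], 0.0, 1.0 + sigma2)
        a = np.concatenate(([fresh], grow))
        a /= a.sum()
        seg = np.concatenate(([x[n]], seg + x[n]))
        k = np.arange(1, n + 2)
        means[n] = np.sum(a * seg / (k + prec0))
    return means
```

Step $n$ touches $n$ mixture components, which gives the $O(T^2)$ total noted in the remark above.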


1 


{ 


(T-n  "  0,1, 


T-1?  k«0 


T-n) 


.17. 


=ij  - 


Thus $\{C_{(i+1)j} : 0 \le i < n \le j \le T\}$ represents the conditional
distribution of the two change points surrounding time $n$.

So, $\Pr(J_n = 1 \mid X_1^T)$ can be computed by

$\Pr(J_n = 1 \mid X_1^T) = \sum_{k=0}^{n-1} C_{(k+1)n}$.

In particular,

$\Pr(\text{No change in } [1,T] \mid X_1^T) = C_{1T}$

can be used to test whether changes have ever happened.

(2) $E(\mu_n \mid X_1^T) = \sum_{1 \le i \le n \le j \le T} C_{ij}\,(S_j - S_{i-1})/(j-i+1+\sigma^{-2}) = \sum_{k=1}^{T} d_k^{(n)} X_k$,

where

$d_k^{(n)} = \sum_{1 \le i \le \min(n,k),\ \max(n,k) \le j \le T} C_{ij}/(j-i+1+\sigma^{-2})$,  $1 \le k \le T$.

So, $0 < d_1^{(n)} < d_2^{(n)} < \cdots < d_{n-1}^{(n)} < d_n^{(n)} > d_{n+1}^{(n)} > \cdots > d_T^{(n)} > 0$

and

$\sum_{k=1}^{T} d_k^{(n)} = \sum_{1 \le i \le n \le j \le T} C_{ij}\,\frac{j-i+1}{j-i+1+\sigma^{-2}} < 1$.

Thus, the Bayes estimate $E(\mu_n \mid X_1^T)$ is a sample dependent
weighted average of the observations $X_k$ and the prior mean
0, and the weights $d_k^{(n)}$ attain their maximum at $k = n$
and decrease strictly as $k$ moves away from $n$ on either
side.

(3) The number of operations required to compute
$\alpha_n$, $\beta_{T-n}$, $C_{ij}$ is $O(T^2)$. The number of
operations to compute $E(\mu_n \mid X_1^T)$ is $O(n(T-n))$. So the
total number of operations to compute $E(\mu_n \mid X_1^T)$ for all
$n$ is $O(T^3)$.

5. An Approximation to the Bayes Solution

Harrison and Stevens (1976) proposed, for the
filtering problem, an approximation technique for computing
the posterior distributions of states in multi-process
models. Their basic idea is to apply the following step
recursively in time. First, the (estimated) posterior
distribution of the state at time t is approximated by a
normal distribution with the same first two moments. Next,
this normal approximation is used together with the
observation at t + 1 to estimate the posterior distribution
of the state at t + 1.
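The two-step idea just described can be sketched for the step-function model as follows. This is our own illustration derived from Proposition 4.1 and may differ in form from the paper's recursion; the names `moment_matching_filter`, `theta` and `tau2` are hypothetical, and unit noise variance and prior mean 0 are assumed. At each step the posterior is a two-component normal mixture ("no change" and "change"), which is then collapsed to a single normal with the same mean and variance.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def moment_matching_filter(x, p, sigma2):
    """Approximate filtering means and variances in O(T) operations.

    Returns a list of (theta_n, tau2_n), the mean and variance of the normal
    approximation to the distribution of mu_n given x_1..x_n.
    """
    prec0 = 1.0 / sigma2
    theta = x[0] / (1.0 + prec0)
    tau2 = 1.0 / (1.0 + prec0)
    out = [(theta, tau2)]
    for xn in x[1:]:
        # (prior weight, prior mean, prior variance) of mu_n before seeing xn
        comps = [(1.0 - p, theta, tau2),   # no change: carry the approximation
                 (p, 0.0, sigma2)]         # change: fresh draw from the prior
        ws, ms, vs = [], [], []
        for w, m, v in comps:
            ws.append(w * normal_pdf(xn, m, v + 1.0))  # predictive likelihood
            post_v = v / (v + 1.0)                     # conjugate normal update
            ms.append(post_v * (m / v + xn))
            vs.append(post_v)
        total = sum(ws)
        ws = [w / total for w in ws]
        # Collapse the mixture to a normal with the same first two moments.
        theta = sum(w * m for w, m in zip(ws, ms))
        tau2 = sum(w * (v + m * m) for w, m, v in zip(ws, ms, vs)) - theta ** 2
        out.append((theta, tau2))
    return out
```

Each time step costs a fixed number of operations, so the whole pass is $O(T)$, in contrast with the growing mixture of the exact filter.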

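Applying this idea, we can approximate $L(\mu_n \mid X_1^n)$ as
follows. Suppose that $L(\mu_n \mid X_1^n)$ approximately equals
$N(\theta_n, \tau_n^2)$. By use of Proposition 4.1, we are led to the
following recursion with initial conditions $\theta_1 = X_1 (1 + \sigma^{-2})^{-1}$
and $\tau_1^2 = (1 + \sigma^{-2})^{-1}$.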


satisfy  the  reversed-time  version  of  (5,1). 

Now, we extend the Harrison-Stevens approximation to the
smoothing case. We need the following variation of
Proposition 4.6, the proof of which is similar to that
of Proposition 4.6.


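Proposition 5.1

$f_{\mu_n}(z \mid x_1^T) \propto \phi(x_n - z)\,\bigl[(1-p) f_{\mu_{n-1}}(z \mid x_1^{n-1}) + p f_\mu(z)\bigr]$
$\times \bigl[(1-p) f_{\mu_{n+1}}(z \mid x_{n+1}^T) + p f_\mu(z)\bigr] / f_\mu(z)$  in $z$, $2 \le n \le T-1$.

Since we approximate $L(\mu_{n-1} \mid X_1^{n-1})$ and $L(\mu_{n+1} \mid X_{n+1}^T)$
by normal distributions, we are
naturally led by use of Proposition 5.1 to the following
approximation to the mean of $L(\mu_n \mid X_1^T)$.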


n  •  1,2,...,  T  -  I 

T 

Similarly,  suppose  that  approximately  equals 

N(u..,6^).  Since  the  system  is  time  reversible, 

n  n  n  n 


2  <  n  <  T-1 


where 


h<2>-  , 

i  -  1,  2 

»  3,  4 

<.(n) 

■=  1  g'f 

i»l  ^ 

-  <i-p)^p(e 

'  “n.l-  «n*l'  ’'n 

-  p(i-p)P(9^ 

-  p(l-p)P(0, 

“n+1'' 

*n*l'  ^n> 

2  2 
»  p  F(0,a'^, 

0,a^,  x„; 
n 

1 

and 


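$F(A,B,C,D,X) = \bigl(BD(1-\sigma^{-2}) + B + D\bigr)^{-1/2}$

$\times \exp\Bigl\{ \frac{(X + AB^{-1} + CD^{-1})^2}{2(1 + B^{-1} + D^{-1} - \sigma^{-2})} - \frac{1}{2}\Bigl[\frac{A^2}{B} + \frac{C^2}{D}\Bigr] \Bigr\}$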


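Remarks:

(1) When $\sigma^2 < 1$, it can happen with a very small probability
that the quantity $1 + B^{-1} + D^{-1} - \sigma^{-2}$ in the exponent of $F$,
evaluated at the approximating variances for times $n-1$ and $n+1$,
is not positive. When this event
happens, the right side of the formula in Proposition 5.1
does not converge to 0 as $|z|$ goes to $\infty$ where
$f_{\mu_{n-1}}(z \mid x_1^{n-1})$ and $f_{\mu_{n+1}}(z \mid x_{n+1}^T)$ are replaced by the
corresponding normal approximations. Therefore, (5.2) is
no longer a good approximation. One suggestion is to
replace (5.2) by another estimate when this event occurs.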

(2) The number of operations required to compute the approximate
estimates for all $n$ is $O(T)$. Hence, the computational requirements
of the approximation
are much smaller than those of the exact Bayes solution.

(3) We will see in Section 7 that the approximate estimates are close to
$E(\mu_n \mid X_1^T)$ in the sense of mean squared error. Thus the
approximate estimates are nearly optimal.

(4) It is not clear how one can apply Harrison-Stevens'
idea to approximate efficiently the conditional change
probabilities $\Pr(J_n = 1 \mid X_1^T)$ and $\Pr(\text{No change} \mid X_1^T)$.

6. $E(\mu_n \mid X,J)$: The Estimates of $\mu_n$ Given the Change
Points

In this section, we study $E(\mu_n \mid X,J)$, which can be
used to see how much additional information for estimating
$\mu_n$ is obtained from the knowledge of the change points.

Define $[r_n(J), s_n(J)]$, the largest integral
interval containing $n$ which contains no change
($1 \le r_n(J) \le n \le s_n(J) \le T$). Since $X$ and $\mu$ are
Gaussian conditional on $J$, the minimum variance estimate
of $\mu_n$ given $X$ and $J$ is the linear estimate

(6.1)  $E(\mu_n \mid X,J) = \sum_{k=r_n(J)}^{s_n(J)} X_k \,/\, (s_n(J) - r_n(J) + 1 + \sigma^{-2})$

and

(6.2)  $E\bigl((E(\mu_n \mid X,J) - \mu_n)^2 \mid J\bigr) = (s_n(J) - r_n(J) + 1 + \sigma^{-2})^{-1}$

and

(6.3)  $AMSE(E(\mu_n \mid X,J)) = \frac{1}{T} \sum_{n=1}^{T} E\bigl[(s_n(J) - r_n(J) + 1 + \sigma^{-2})^{-1}\bigr]$.

The following proposition gives an explicit expression for
the asymptotic behavior of $AMSE(E(\mu_n \mid X,J))$ as $T \to \infty$.
The proof can be found in Yao (1981).

Proposition  6.1 

T 

AMSE(E(u  |X,J))  2T'^  I  E(E(u^|X,J)  -  u 
n  -  .  n  -  .  n 

-  P  -  ^(1-p)  ^  /  f_dx+o(l) 

0 

(T-^)  . 
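As an illustration of (6.1) and (6.2), the sketch below computes the estimate and its conditional mean squared error segment by segment. The function name `known_change_point_estimate` and the 0-based indexing convention (`j[n] == 1` marks a change between positions `n` and `n + 1`) are our own assumptions; unit noise variance and prior mean 0 are assumed as in the model.

```python
import numpy as np


def known_change_point_estimate(x, j, sigma2):
    """Return (estimate, conditional MSE) per time point given change points."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # Segment boundaries implied by the change indicators.
    cuts = [0] + [n + 1 for n in range(T - 1) if j[n]] + [T]
    est = np.empty(T)
    mse = np.empty(T)
    for a, b in zip(cuts[:-1], cuts[1:]):
        k = b - a                                   # s_n(J) - r_n(J) + 1
        est[a:b] = x[a:b].sum() / (k + 1.0 / sigma2)  # equation (6.1)
        mse[a:b] = 1.0 / (k + 1.0 / sigma2)           # equation (6.2)
    return est, mse
```

For instance, with `x = [1, 2, 3, 4]`, a single change after the second point and `sigma2 = 1`, each segment has two observations, so the estimates are `3/3` and `7/3` and every conditional mean squared error is `1/3`.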

7. Comparison Among Four Types of Estimates.

In this section, the performance of the best linear estimates $\hat\mu_n$,
the approximations $\tilde\mu_n$ of Section 5, $E(\mu_n \mid X)$
and $E(\mu_n \mid X,J)$ is compared in terms of their average mean
squared errors for $T = 20$. Sixty cases are considered
where $p \in \{0.05, 0.1, 0.2, 0.4, 0.6, 0.8\}$ and
$\sigma \in \{0.3, 0.5, 1, 2, 3, 4, 5, 7, 10, 15\}$.

The AMSE of $\hat\mu_n$ is calculated from Proposition 3.1
while that of $E(\mu_n \mid X,J)$ is estimated by simulation with
2000 replications for each one of the 60 cases.
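A simulation of this kind for $E(\mu_n \mid X,J)$ might look like the following sketch. It is not the code used for the reported tables: the function name `amse_known_change_points`, the seed handling and the segment-wise sampling are our own choices, with unit noise variance and prior mean 0 assumed. It returns two estimates of the same quantity, the average squared error observed in the simulated data and the average of the conditional variance $(k + \sigma^{-2})^{-1}$ from (6.2), where $k$ is the length of the segment containing $n$; the two should agree up to Monte Carlo error.

```python
import numpy as np


def amse_known_change_points(T, p, sigma2, reps=2000, seed=0):
    """Monte Carlo AMSE of the known-change-point estimate, two ways."""
    rng = np.random.default_rng(seed)
    prec0 = 1.0 / sigma2
    sq_err = 0.0    # accumulates observed squared errors
    cond_var = 0.0  # accumulates conditional variances given J, as in (6.2)
    for _ in range(reps):
        j = rng.random(T - 1) < p
        cuts = np.concatenate(([0], np.flatnonzero(j) + 1, [T]))
        for a, b in zip(cuts[:-1], cuts[1:]):
            k = b - a
            level = rng.normal(0.0, np.sqrt(sigma2))     # height of this segment
            xs = level + rng.normal(0.0, 1.0, size=k)    # noisy observations
            est = xs.sum() / (k + prec0)                 # equation (6.1)
            sq_err += k * (est - level) ** 2             # same error at k points
            cond_var += k / (k + prec0)
    return sq_err / (reps * T), cond_var / (reps * T)
```

The second returned value has smaller Monte Carlo variance because it averages only over the change-point configuration.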

Since $E(\tilde\mu_n - \mu_n)^2 = E[E(\mu_n \mid X) - \mu_n]^2 + E[\tilde\mu_n - E(\mu_n \mid X)]^2$,

we have

(7.1)  $AMSE(\tilde\mu_n) = AMSE(E(\mu_n \mid X)) + AMSE(\tilde\mu_n - E(\mu_n \mid X))$

where $AMSE(\tilde\mu_n - E(\mu_n \mid X)) = T^{-1} \sum_{n=1}^{T} E[\tilde\mu_n - E(\mu_n \mid X)]^2$. The
AMSE of $E(\mu_n \mid X)$ and $\tilde\mu_n - E(\mu_n \mid X)$ are estimated by simulation
with 400 replications for each one of the 60 cases.

The AMSE of the approximation $\tilde\mu_n$ is estimated by use of (7.1).

The simulation results are summarized in Table 7.1
and Figure 7.1 where either $p$ or $\sigma$ is fixed.

It is also interesting to compare the approximation $\tilde\mu_n$ with
$E(\mu_n \mid X)$ for small $p$. We consider six cases where $T = 100$,
$p \in \{0.01, 0.03, 0.05\}$ and $\sigma \in \{1, 3\}$. Only

Et  C  jju,-E(Ujj(,|X))^  ana  ElE(u^j,jlX)  are  eatlmatad. 

The  simulation  is  done  with  900  replications  for  each  one 
of  the  6  cases.  The  results  are  presented  in  Table  7.2. 
Remarks:

(1) It can be shown that the AMSE of the linear estimate $\hat\mu_n$ and of $E(\mu_n \mid X,J)$
are increasing as $p$ or $\sigma^2$ increases. So is the AMSE of
$E(\mu_n \mid X)$ as $p$ increases. (See Appendix A.3.) However,
from the simulation results, it appears that as $\sigma^2$
increases, $AMSE(E(\mu_n \mid X))$ first increases and then decreases
and eventually approaches $AMSE(E(\mu_n \mid X,J))$. One explanation
is that when $\sigma^2$ is large enough $J$ can be well estimated
from $X$, and this information can offset the loss of the
relatively small prior information about $\mu_n$.



(2) From the simulation results, it appears that $E(\mu_n \mid X)$
is only slightly worse than $E(\mu_n \mid X,J)$ in every case.
However, the linear estimate $\hat\mu_n$ is very poor when $\sigma^2$ is moderately large and $p$
is small. Actually, if we allow $T \to \infty$ and fix $\sigma^2$, it is not
difficult to show that


A«SE(U|j)  ' 

(p  .  0* 

$AMSE(E(\mu_n \mid X,J)) = p + o(p)$   $(p \to 0^+)$.

In other words, the linear estimate $\hat\mu_n$ is very inefficient compared with
$E(\mu_n \mid X,J)$ when $p$ is small. The asymptotic behavior of
$AMSE(E(\mu_n \mid X))$ as $p \to 0^+$ is not known.

(3) Since $\sigma^2$ can be regarded as the signal to noise ratio,
it is interesting to consider relative mean squared error,
i.e. mean squared error divided by the energy of the signal.
In other words, $\sigma^{-2}$AMSE is used to replace AMSE. As a
matter of fact, the relative AMSE for the case that $L(\mu_n) =
N(0,\sigma^2)$ and $L(\epsilon_n) = N(0,1)$ is the same as the AMSE for the
case that $L(\mu_n) = N(0,1)$ and $L(\epsilon_n) = N(0,\sigma^{-2})$. Therefore,
it is not hard to show that the relative AMSE of the linear estimate $\hat\mu_n$, of $E(\mu_n \mid X)$
and of $E(\mu_n \mid X,J)$ are decreasing as the signal to noise ratio
$\sigma^2$ increases. (See Appendix A.3)


(4) It appears that $AMSE(\tilde\mu_n - E(\mu_n \mid X))$ is decreasing as $p$
increases. Equivalently, the larger $p$ is, the better
Harrison-Stevens' approximation is. $AMSE(\tilde\mu_n - E(\mu_n \mid X))$ is
at most about 10% of $AMSE(E(\mu_n \mid X))$ in our simulation cases.
Since the cost of computing the approximation $\tilde\mu_n$ is much less than that of
$E(\mu_n \mid X)$, it may be desirable to substitute $\tilde\mu_n$ for $E(\mu_n \mid X)$
when $p \ge 0.05$.

(5) According to Table 7.2, when $p$ is very small, say
about 0.01 or less, the approximation $\tilde\mu_n$ is no longer close to optimal. In
other words, the more complicated approximation is preferable
for small $p$.
A.2 Proof of D Being Independent of n in Proposition 4.7

From the recursive definitions of the $\alpha$ and $\beta$, for
$1 \le n < n+1 \le T$


A.3 Proof of Monotonicity of the AMSE of $E(\mu_n \mid X)$ and
$E(\mu_n \mid X,J)$ in $p$ and/or in $\sigma^2$.

Proposition A3.1


2(i*o  *) 


The MSE of the minimum variance linear estimate of $\mu_n$ is increasing as $p$ increases for all $n$.


“n+1 


T-n-l 

Z 

)‘0 


Vi7(j^ 


1)CJ‘ 


^^+nn  -  y 


i.e. 


2  (1-p) 

i-1 


n-l 


Vl+(n- 


; 


i+l)c* 


-  Si-1> 


2(n-i+l+o^ 


j*n+l 


(1-p) 


^+Tj-n)c 


-  V 

2(j-n+o“^) 


i.e. 


n 

Z 

i-1 


T 

r 

j-n+l 


^n+l,j 


So* 


1  "  I  ^14 

IslsnsjsT  lsl<n+l<i<T  ^ 


□ 


Proof of Proposition A3.1

Throughout the proof, $p$ and $q$ ($0 < p < q < 1$) are fixed.
Without loss of generality, let the time index run from $-T'$
to $T''$ ($T', T'' \ge 0$) and $n = 0$. Denote by $V(p)$ (or $V(q)$)
the MSE of the minimum variance linear estimate at time 0
when the rate of change is $p$ (or $q$, respectively). We
want to show $V(p) \le V(q)$.
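The claim can be checked numerically before reading the proof. The sketch below is our own illustration, not part of the original argument: it uses the fact that under the model $\mathrm{Cov}(\mu_i, \mu_j) = \sigma^2 (1-p)^{|i-j|}$ (the two heights coincide exactly when no change falls between $i$ and $j$, and are otherwise independent with mean 0), adds unit noise variance on the diagonal, and evaluates the mean squared error of the best linear estimate at a given index. The function name `linear_mse` and its 0-based index argument `n` are hypothetical.

```python
import numpy as np


def linear_mse(T, n, p, sigma2):
    """MSE and weights of the minimum variance linear estimate of mu_n.

    T      : number of observations
    n      : 0-based index of the time point being estimated
    p      : rate of change
    sigma2 : signal variance (noise variance is taken to be 1)
    """
    idx = np.arange(T)
    # Signal covariance: sigma2 * (1 - p)^{|i - j|}
    cov_mu = sigma2 * (1.0 - p) ** np.abs(idx[:, None] - idx[None, :])
    cov_x = cov_mu + np.eye(T)          # observations add unit white noise
    c = cov_mu[n]                       # Cov(mu_n, X)
    weights = np.linalg.solve(cov_x, c)
    return sigma2 - c @ weights, weights


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2, 0.4, 0.6, 0.8):
        print(p, linear_mse(21, 10, p, 4.0)[0])
```

Running the loop for a fixed index and increasing `p` should produce a nondecreasing sequence, in line with $V(p) \le V(q)$.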

Let $\{\xi_i : -T' \le i \le T''\}$ be a stationary Gaussian AR(1)
process with parameter $1-p$ and mean 0 variance $\sigma^2$. Let
$\{\zeta'_i : i \ge 1\}$ and $\{\zeta''_i : i \ge 1\}$ be two stationary Gaussian
AR(1) processes with common parameter $1-q$ and common mean
0 and common variance $\sigma^2$. Let $\alpha$ and $\beta$ be i.i.d. with
$\Pr(\alpha = i) = r(1-r)^{i-1}$, $i = 1, 2, \ldots$ where $r = (q-p)/(1-p)$. Let
$\{\epsilon_i : -T' \le i \le T''\}$, $\{\epsilon'_i : i \ge 1\}$ and $\{\epsilon''_i : i \ge 1\}$ be
three independent Gaussian white noises with common variance
1. All the processes $\{\xi_i\}$, $\{\zeta'_i\}$, $\{\zeta''_i\}$, $\alpha$, $\beta$, $\{\epsilon_i\}$,
$\{\epsilon'_i\}$ and $\{\epsilon''_i\}$ are mutually independent.

For  1  bet%^en  -T  and  T  ,  define 


-  T 


<  1  <  T* 


i  -  0 

i  1  1 
i  <  -  1 


-B  <1  <9 

i  >  a 


i  < 


<•0  <  i  <  a 

otherwise 


estimate  of  Cq  based  on  is  ^  1  ^  T") . 

Also  note 

V(p)  -  EIE(Jj|x3’,)  -  Cj)* 

-  ElE(Ej(xJ^,a,S,(XM^.  )  -5,1* 

<  eib!5o!(x")!”,)  -Ejl* 

-  ElE{llpl  (X-)I*,)  -njl*<V(q) 

It  is  not  hard  to  see  that  V(p}  <  V(q>  except  for 

-  T*  -  0. 

The  only  thing  left  is  to  show  that  for  i  ^  j 

and 


(Piously  E  X^}  has  the  same  covariance  stnacture 

as  the  special  Bayesian  model  with  the  rate  of  change  equal 
to  p.  Suppose  that  X^}  has  the  sane  covariance 

structure  as  the  model  with  the  rate  of  change  equal  to  q. 
Since  X^}  is  Gaussian,  the  minimum  variance  linear 


“  *tj  * 

The  last  two  equations  can  be  easily  derived  from  the  first 
one. 


Now  consider  the  following  cases t 


(1)  0  <  1  <  j, 

•*■  E(ni,nj|i  <-i)Pr(:<a) 

.  a*  (1-q)  (I-  (1-t)  +04.(J* (V-p)  (l-rl  ^ 

(2)  i  <  j  <  0/  the  same  as  (1). 

(3)  i  <0  <j 

®<njLnj)  -  E(n^n^  lo^j)Pr(a^)+E(n^n^ia>  j,-6<i)  X 
Pr(o>j,-ff<i)  +  E(n^nj|  a  > J,-fl>i)Pr(a  >j,-8>t) 


Proposition A3.2

$E[E(\mu_n \mid X) - \mu_n]^2$ is increasing as $p$ increases for all $n$.

Proof of Proposition A3.2

Throughout the proof, $\sigma^2$, $p$ and $q$ ($0 < p < q < 1$) are
fixed. Without loss of generality, let the time index run
from $-T'$ to $T''$ and $n = 0$. Also, denote by $V(p)$ (or
$V(q)$) the MSE of the Bayes estimate at time 0 when the
rate of change is $p$ (or $q$, respectively). We want to
show $V(p) \le V(q)$.

Let  ^2*  ^3  <•••>  »»»<*  ^2*  ''3  ^ 

be  two  independent  discrete  time  Poisson  processes  with 
parameter  p.  in  other  words,  "  Cj^r..*p 

are  i.i.d.  with  Pr(^^-i)  •  p(l-p)^"^/  1*1,2 .  Let 

....  b«  i.i.d.  M(0.o*).  Define 


-  0  +  o’(l-p)^'‘(l-r)l(l-r)'^  ♦  0 


i  I 


*-k 


-tij  <  i  <{j 

-Vi  ‘  ^  i  -"k 


Let  ^  i^ere  (c^)  is  Gauseien  white  noise  with 

•  I.  The  processes  (Cj^)  , 


4 


mutually  independent.  Thus  -T'^i<T")  satisfies 

all  the  assumptions  of  the  special  Bayesian  model  with  the 
rate  of  change  equal  to  p« 

Now,  we  will  generate  a  system  independent  of  the 
previous  one.  Let  a^,  •***•  -By... 

be  mutually  independent  where 

Pr(aj-i)  -  Pr(6^-i)»r(l-r)‘"^ 

Pr  (“15+1*0)5*^)  ■Pf  1 -q <l-q)  ,k-l  ,2 , . . . ; i*l  ,2 , . . . 


Suppose  that  {uj  ,  XJr  -T’  1  i  <.  T*}  satisfies  all  the 

assumptions  of  the  model  with  the  rate  of  change  equal  to 

q.  Since  u"  ■  u*  , 

0  ® 

S[E(«g|X")-U(,]*  -  V(q) 

Using  the  independence  of  the  first  two  systems 
Vfp?  -  5fE(*;Q|X)-Ug)^ 

Let  %rhete  (cj)  is  Caussian  white  noise  ■  E[E(tiQ|Xpa^,9^, 

with  Ec!^  -  1.  The  processes  {a. J  ,  (6.)  ,  (Z.)  ,  and 

*  2 
{cp  are  mutually  Independent.  5.  XIX(Uq1x*)-uqI 

Now  we  define  the  third  system  in  terms  of  the  previous 

-  V(q) 


and 

r  ■  (q-p)/<l-p) . 

Let  Z_J^,  2j,  be  i.i.d.  »f(0,o^).  Define 

“i  ■  *0  '  '®1  ‘  I*®! 

'  *-)c  '  -»k.l  ‘  ^  i  -»k 


two.  Define 


It  is  easy  to  see  that  V(p)  <  V(q)  except  for  t'  -  T”  -  0. 

Now,  the  only  thing  left  is  to  show  {yt  ,  x^)  satisfies 
all  the  assumptions  of  the  model  with  the  rate  of  change  equal 
to  q.  Actually  we  need  only  show  that  CiiCg  *Ci>--' 
i.i.d.  with  Pr(t^-i)-q(l-q)^"^,  i=l,2,...  where- 

u  5  sup  (h!  <  Oj  (  U  {0) 


(l-p)^’^(l-r)^'^ 


^*")t*l‘')c*^''l . 


Proposition A3.3


if  1  <  i  <  u 


Let  Cq  5  0.  For  every  k  0, 


$E[E(\mu_n \mid X,J) - \mu_n]^2$ is increasing as $p$ or $\sigma^2$
increases for all $n$.


Proof of Proposition A3.3

Throughout the proof $\sigma_1^2$, $\sigma_2^2$, $p_1$ and $p_2$
($\sigma_1^2 < \sigma_2^2$, $0 < p_1 < p_2 < 1$) are fixed. Denote by $V(p, \sigma^2)$ the
MSE of the estimate at time $n$ with known change points
when the rate of change is $p$ and the variance of the signal
is $\sigma^2$. We want to show


Pr  ttit+i-Chil- 15  (5  )t+i"5  x2.^  1 5i*”i  '  “I’^k* 


-»r  tmin  U,,+i .  »i>  1 5  i"®tP  < 


VCPj.Oj*)  <  Vlpj.o^*)  end  VCPj.oj*)  <  V(Pj,Oj*) . 
let  {Jj  :  1  i  i  1  T-1)  and  fJ^  !  1  i  1  1  *>• 

two  independent  Bernoulli  sequences  with  Pr(J^«l)»P2>l*Pr(J^«0) 
and  Pr(JJ-l)-pj/P2«l-Pr(JJ^-0) .  Let  Yj^,...pY^  be  i.i.d.  N(0pl 


,  and  (Uj'”) 


I 


% 


,(1) 


nil  -  +  Jn'’l’'n+1'  "-1 .2 . . . .  ,T-1 


.(2) 


,<3) 


“n+1 


,  n-l,2,...,T 


^  V;®lVl,n.l,2 . T-l 


E(IE(U^^*  1x‘^’  ,JJ')-u„'^’l^|JJ')iE(lE(Uj^'^’  |x‘^’  ,J) 


1*|J) 


E(IE(l*n'^*  1*'^’  <E((E(u^*^’  lx'*',j)-|i„'’’l*|j) 


Therefore# 


V(Pj,o^*)  <  VCpj.Oj^) 


Vtpj.a^^)  <  V(pj,aj^) 


Let  xi“’  -  ui®’  +  c„  a-l,2,3j  n-l,2,...,T  where  (c„} 

n  n  n  n 

is  Gaussian  white  noise  with  ■  1.  From  the  construction 

of  these  processes ,  we  have 

EtE(u^“’ ix'"’ -  V(Pj,0^^),  a  -  1,2 

E[E(u‘^’ |x'^’ ,  -  V(p  ,0,^1 

n  -  »  n  AX 

«^ere  JJ*  is  the  componentwise  product  of  ^  and 
From  (6.2) 

2  -2 
E{(B(y„|XpJ)-Unl  IJ)  -  (S^(J)-rjj(J)+l^o  , 


It  is  easy  to  see  that  v(Pj^,0j^^)  <  Vlp^,  unless  the 

time  span  T  ■  1.  O 

Proposition A3.4

The AMSE of the linear estimate $\hat\mu_n$ is increasing as $\sigma^2$ increases.

This proposition can be easily proved by use of the results
in Yao (1981).

Proposition A3.5

The relative AMSE ($\sigma^{-2}$AMSE) of $E(\mu_n \mid X)$ and $E(\mu_n \mid X,J)$ are
decreasing as $\sigma^2$ increases.

From remark (3) in Section 7 one can readily
see that Proposition A3.5 is a consequence of the following
lemma.


we  can  easily  see  that 


Lemma A3.1

Let $\{\mu_n : 1 \le n \le T\}$ be a stochastic signal sequence
with finite second moments. Let the observations
$X_n = \mu_n + \epsilon_n$, $1 \le n \le T$, where $\{\epsilon_n\}$ is i.i.d. $N(0, \sigma^2)$
and is independent of $\{\mu_n\}$. Then $E[E(\mu_n \mid X) - \mu_n]^2$ is
increasing as $\sigma^2$ increases.

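Proof of Lemma A3.1

Denote by $V_n(\sigma^2)$ the MSE of the Bayes estimate of
$\mu_n$ when $E\epsilon_n^2 = \sigma^2$. We want to show that $V_n(\sigma_1^2) \le V_n(\sigma_2^2)$
for $0 \le \sigma_1^2 < \sigma_2^2$.

Let $\{\epsilon_n\}$ and $\{\epsilon'_n\}$ be two mutually independent
i.i.d. Gaussian sequences with common mean 0 and variances
$\sigma_1^2$ and $\sigma_2^2 - \sigma_1^2$ respectively. $\{\mu_n\}$, $\{\epsilon_n\}$ and
$\{\epsilon'_n\}$ are mutually independent. Let $X_n = \mu_n + \epsilon_n$,
$X'_n = \mu_n + \epsilon_n + \epsilon'_n$, $1 \le n \le T$. Thus,

$V_n(\sigma_1^2) = E[E(\mu_n \mid X) - \mu_n]^2$

$= E[E(\mu_n \mid X', \{\epsilon'_i\}) - \mu_n]^2$

$\le E[E(\mu_n \mid X') - \mu_n]^2 = V_n(\sigma_2^2)$.  □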


Table 7.1  The AMSE of Four Types of Estimates as Functions of
p and $\sigma$ for T = 20.

(a) $\sigma$ = .3

  p                  .05      .1       .2       .4       .6       .8
  Linear             .0494    .0584    .0682    .0768    .0804    .0821
  Bayes              .0463    .0577    .0648    .0761    .0801    .0819
                    (.0027)  (.0027)  (.0025)  (.0020)  (.0019)  (.0016)
  Known Change       .0415    .0481    .0581    .0694    .0756    .0798
  Points            (.0002)  (.0002)  (.0002)  (.0001)  (.0001)  (.0001)
  (H-S) - Bayes      .0002    .0001    .0000    .0000    .0000    .0000
                    (.0001)  (.0000)  (.0000)  (.0000)  (.0000)  (.0000)

(b) $\sigma$ = .5

  p                  .05      .1       .2       .4       .6       .8
  Linear             .0880    .1119    .1415    .1732    .1894    .1975
  Bayes              .0601    .1125    .1410    .1741    .1853    .1957
                    (.0041)  (.0050)  (.0047)  (.0044)  (.0039)  (.0034)
  Known Change       .0622    .0793    .1062    .1425    .1669    .1858
  Points            (.0004)  (.0005)  (.0005)  (.0004)  (.0003)  (.0002)
  (H-S) - Bayes      .0017    .0012    .0007    .0001    .0000    .0000
                    (.0002)  (.0001)  (.0001)  (.0000)  (.0000)  (.0000)